PanCGH: a genotype-calling algorithm for pangenome CGH data
Abstract
MOTIVATION: Pangenome arrays contain DNA oligomers targeting several sequenced reference genomes from the same species. In microbiology, these can be employed to investigate the often high genetic variability within a species by comparative genome hybridization (CGH). The biological interpretation of pangenome CGH data depends on the ability to compare strains at a functional level, particularly by comparing the presence or absence of orthologous genes. Due to the high genetic variability, available genotype-calling algorithms cannot be applied to pangenome CGH data.
RESULTS: We have developed the algorithm PanCGH that incorporates orthology information about genes to predict the presence or absence of orthologous genes in a query organism using CGH arrays that target the genomes of sequenced representatives of a group of microorganisms. PanCGH was tested and applied in the analysis of genetic diversity among 39 Lactococcus lactis strains from three different subspecies (lactis, cremoris, hordniae) and isolated from two different niches (dairy and plant). Clustering of these strains using the presence/absence data of gene orthologs revealed a clear separation between different subspecies and reflected the niche of the strains.
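The general idea of calling presence or absence per ortholog group, and then comparing strains on those calls, can be illustrated with a small sketch. This is not the published PanCGH procedure: the function names, the median-of-probes gene score, the "present if any member gene scores high enough" rule, the 0.5 threshold, and the Jaccard distance are all hypothetical choices made only for illustration.

```python
from statistics import median


def gene_score(probe_signals):
    """Summarize one gene by the median of its normalized probe signals.

    Assumption: signals are already normalized to a 0..1 scale.
    """
    return median(probe_signals)


def call_ortholog_presence(probe_signals_by_gene, ortholog_groups, threshold=0.5):
    """Return {group_id: bool} for one query strain.

    Hypothetical rule: a group is called present when at least one of its
    member genes (from any reference genome on the array) reaches the threshold.
    """
    calls = {}
    for group_id, genes in ortholog_groups.items():
        scores = [
            gene_score(probe_signals_by_gene[g])
            for g in genes
            if g in probe_signals_by_gene
        ]
        calls[group_id] = bool(scores) and max(scores) >= threshold
    return calls


def jaccard_distance(calls_a, calls_b):
    """Distance between two strains based on their sets of present groups."""
    present_a = {g for g, present in calls_a.items() if present}
    present_b = {g for g, present in calls_b.items() if present}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


# Toy data: gene names and signal values are invented for this example.
probes = {
    "ref1_geneA": [0.9, 0.8, 0.85],
    "ref2_geneA": [0.2, 0.1, 0.3],
    "ref1_geneB": [0.1, 0.2, 0.15],
}
groups = {"OG1": ["ref1_geneA", "ref2_geneA"], "OG2": ["ref1_geneB"]}

strain_calls = call_ortholog_presence(probes, groups)
print(strain_calls)
```

A pairwise distance matrix built with `jaccard_distance` could then be passed to any standard hierarchical clustering routine to group strains, in the spirit of the clustering described in the abstract.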
Similar resources
PanCGHweb: a web tool for genotype calling in pangenome CGH data
UNLABELLED A pangenome is the total of genes present in strains of the same species. Pangenome microarrays allow determining the genomic content of bacterial strains more accurately than conventional comparative genome hybridization microarrays. PanCGHweb is the first tool that effectively calls genotype based on pangenome microarray data. AVAILABILITY PanCGHweb, the web tool is accessible fr...
FACADE: a fast and sensitive algorithm for the segmentation and calling of high resolution array CGH data
The availability of high resolution array comparative genomic hybridization (CGH) platforms has led to increasing complexities in data analysis. Specifically, defining contiguous regions of alterations or segmentation can be computationally intensive and popular algorithms can take hours to days for the processing of arrays comprised of hundreds of thousands to millions of elements. Additionall...
CGHcall: calling aberrations for array CGH tumor profiles
UNLABELLED CGHcall achieves high calling accuracy for array CGH data by effective use of breakpoint information from segmentation and by inclusion of several biological concepts that are ignored by existing algorithms. The algorithm is validated for simulated and verified real array CGH data. By incorporating more than three classes, CGHcall improves detection of single copy gains and amplifica...
Spatial smoothing and hot spot detection for CGH data using the fused lasso.
We apply the "fused lasso" regression method of (TSRZ2004) to the problem of "hot-spot detection", in particular, detection of regions of gain or loss in comparative genomic hybridization (CGH) data. The fused lasso criterion leads to a convex optimization problem, and we provide a fast algorithm for its solution. Estimates of false-discovery rate are also provided. Our studies show that the n...
Joint segmentation, calling, and normalization of multiple CGH profiles.
The statistical analysis of array comparative genomic hybridization (CGH) data has now shifted to the joint assessment of copy number variations at the cohort level. Considering multiple profiles gives the opportunity to correct for systematic biases observed on single profiles, such as probe GC content or the so-called "wave effect." In this article, we extend the segmentation model developed ...
Journal title:
- Bioinformatics
Volume 25, Issue
Pages -
Publication date: 2009